10 research outputs found

    Bioinformatics tools @ NBBNet: online infrastructure for the management and analysis of biological data

    The use of informatics tools for the management and analysis of nucleic acid and protein sequences has improved the throughput of wet-lab research in turning biological data into functional biological information. The field of computational biological information management and analysis is generally known as bioinformatics. We discuss some tools and processes that have been developed or integrated into a data management and information presentation pipeline by the Malaysian National Biotechnology and Bioinformatics Network. Central to this is the Bioinformatics Tools @ NBBNet online infrastructure system, which uses grid computing technology. In addition, we discuss the deployment of niche databases and database shells for research on specific datasets, such as a particular protein function, protein family or genome.

    The effects of dictionary and lexicon to morphological analysis

    A dictionary or lexicon is one of the key components of a morphological analyzer, except in purely rule-based tools. However, only a few studies have evaluated the effects of using a dictionary or lexicon in Malay morphological analysis. In this paper, we report the results of such an evaluation on our morphological analyzer, MALIM, based on an auto-expansion simulation. We found that a lexicon improves the analyzer's performance in terms of correctness, as well as the quality of its output.

    An effective security alert mechanism for real-time phishing tweet detection on twitter

    Phishing is a form of social engineering crime used to deceive victims by directing them to a fraudulent website where their private and confidential information is collected for further illegal actions. Phishing attacks, which traditionally targeted email users, now target users of Online Social Networks (OSNs) such as Twitter, Facebook and Myspace. Twitter has become popular with phishers because it disseminates information widely and because phishing there is harder to detect than in email. A security alert that warns Twitter users in real time about tweets containing a phishing Uniform Resource Locator (URL) is therefore crucial. Many solutions have been proposed, but their effectiveness is inadequate or doubtful. In this paper, we propose an effective security alert mechanism that uses a classification model derived from the Random Forest (RF) supervised machine learning technique. With the 11 best classification features identified, the model yielded 94.75% accuracy, higher than the 94.56% obtained by other researchers who used more than 11 features trained on the same dataset collected from Twitter. To determine its effectiveness, we used 200 phishing URLs collected from Twitter and PhishTank respectively. In our experiment, the proposed mechanism prompted the security alert to Twitter users in real time with 97.50% effectiveness.

    MALIM - a new computational approach of Malay morphology

    Malay is categorized as an Austronesian language, a group that also contains Bahasa Indonesia and Tagalog. Quite a number of morphological analyzers have been developed for Malay, including ones based on the two-level formalism, on stemming/conflation models, or on specific models. Their obvious weaknesses are incompleteness and an inability to handle ambiguity, both of which affect the accuracy of analysis. We therefore introduced a new technique called S-A-P-I to handle them in our analyzer, MALIM. In this paper we describe MALIM and an empirical study of its usage. Our results show that our technique increases the accuracy of morphological analysis up to 98.99%, covering 99.99% of the sample data. We therefore believe this approach is the most suitable for handling morphological analysis of Malay.

    Lexicon splitting in lexical disambiguation for Malay morphological analysis and stemming

    Lexical ambiguity is one of the problems faced by morphological analysers and stemmers. It is caused by ambiguous word forms such as homonyms, which can lead the tools to produce incorrect output. A method that resolves ambiguity may therefore improve the performance of such tools. Malay affixation distinguishes between monosyllabic and multisyllabic words. A disambiguation method is proposed for tools that use a lexicon for analysis and stemming: splitting the lexicon into monosyllabic and multisyllabic words. We found that this could help to resolve ambiguity involving monosyllabic words, improve the handling of the language's exceptions and improve storage lookup. This would be useful for Malay morphological analysis and stemming, as the method does not require document-level context analysis of the analysed word.
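
    The idea of a split lexicon can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a crude syllable heuristic (a word with exactly one vowel letter is treated as monosyllabic, which misclassifies words with diphthongs), a tiny hypothetical lexicon, and illustrative function names. It relies on one well-known Malay pattern: the prefix meN- surfaces as menge- before monosyllabic roots (mengecat from cat), and as meng- before vowel-initial roots or k-initial roots whose k is dropped (mengambil from ambil, mengejar from kejar).

```python
VOWELS = set("aeiou")


def count_vowel_letters(word):
    """Crude syllable estimate (assumption): number of vowel letters."""
    return sum(1 for ch in word.lower() if ch in VOWELS)


def split_lexicon(words):
    """Split a root lexicon into monosyllabic and multisyllabic sets."""
    mono, multi = set(), set()
    for w in words:
        target = mono if count_vowel_letters(w) == 1 else multi
        target.add(w.lower())
    return mono, multi


def candidate_roots(word, mono, multi):
    """Return possible roots of a meN- prefixed word.

    'menge-' is only matched against the monosyllabic lexicon;
    'meng-' is only matched against the multisyllabic lexicon,
    trying both the bare remainder and the remainder with a
    restored initial 'k'.
    """
    word = word.lower()
    roots = []
    if word.startswith("menge") and word[5:] in mono:
        roots.append(word[5:])
    if word.startswith("meng"):
        rest = word[4:]
        for cand in (rest, "k" + rest):
            if cand in multi:
                roots.append(cand)
    return roots


# Hypothetical four-word lexicon, for illustration only.
mono, multi = split_lexicon(["cat", "bom", "kejar", "ambil"])
for form in ("mengecat", "mengejar", "mengambil"):
    print(form, candidate_roots(form, mono, multi))
```

    Because each prefix variant consults only one half of the lexicon, a form such as mengecat is never tested against multisyllabic roots through the menge- rule, which narrows the candidates without looking at the surrounding text.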

    The development of Perl Script distribution center portal at Malaysian Genome Institute (MGI)

    The ‘Perl Reference and Distribution Center’ portal was developed as a web-based system. Its objective is to help researchers understand the Perl language and to encourage them to use Perl in their research. Perl, often expanded as Practical Extraction and Report Language, has an advantage in terms of its syntax, since it is an interpreted scripting language. Perl scripts can be developed within a short period and, being open source, Perl has gained a lot of support. Perl also has a huge library and is widely used in bioinformatics. Bioinformatics involves heavy use of software tools for data analysis. Large volumes of transactions and data manipulation have become a common requirement in bioinformatics, and Perl is well suited to them. The main problem addressed is that researchers at MGI are unable to develop custom Perl applications for their own purposes, since their attention is mostly on biological research. The system tries to help researchers by providing easy access to information about Perl through a website. Developed using WebE as the development methodology, the system provides many applications, including a tutorial section and a section for downloading applications. The system should attract researchers to Perl and encourage them to learn the language, because all the resources are grouped in a single place. As a result, researchers will find it easier to learn Perl, which will hopefully increase the productivity and quality of their research.

    Classes and rules of rhyming reduplicated words (kata gandaan berima)

    Rhyming reduplicated words (kata gandaan berima) form a word class in Malay. However, there are problems in defining the criteria for the class and in identifying how its members are formed. This article examines rhyming reduplicated words and puts forward several specific defining features. In addition, the formation of their structure is explained using the Morpheme Doubling Theory (MDT) model together with the “meaning mould” (acuan makna) hypothesis. Three rules for forming rhyming reduplication are described in detail. The study identifies four classes of rhyming reduplicated words, two of which are new classes based on the rules. In conclusion, the classes and the process of rhyming reduplication can be described in greater detail, subject to constraints on certain aspects.

    Formal properties and characteristics of Malay rhythmic reduplication

    Reduplication is a process of Malay word formation. Generally, it can be divided into full, partial and rhythmic reduplication. A further set is the undefined group called free-form reduplication. Lexically, rhythmic reduplication is formed systematically, as some of its processes have been identified. Semantically, reduplication creates ‘multiplicity’, ‘repetition’, ‘concentration’ and ‘variety’. Of these, rhythmic reduplication specializes in the last. Functionally, rhythmic reduplication tends to diversify a set, whereas the others may just multiply it. Our analysis shows that rhythmic reduplication exists for a purpose. Its systematic formation and usage show that the Malay language has its own device for classifying things, one that rigorously describes different sets of objects or their properties.
